The Plant Phenome Journal — Latest Matching Preprints

1

Predicting Lodging Severity in Sorghum Breeding Trials Using UAV-Based Photogrammetrically Derived Height Data

Mothukuri, S. R.; Massey-Reed, S. R.; Potgieter, A.; Laws, K.; Hunt, C.; Amuzu-Aweh, E. N.; Cooper, M.; Mace, E.; Jordan, D.

2026-03-30 plant biology 10.64898/2026.03.26.713817 medRxiv

Top 0.1%

16.6%

Lodging in sorghum presents a significant challenge for plant breeders due to the trade-off between lodging resistance and grain yield. Manually measuring lodging across thousands of plots is time-consuming, expensive, and error-prone, making selection for lodging resistance challenging in breeding programs. Unmanned Aerial Vehicle (UAV)-derived metrics offer a potential high-throughput, cost-effective alternative for lodging phenotyping. This study developed a framework for predicting plot-level lodging from UAV imagery across 2,675 sorghum breeding plots. Multi-temporal canopy height data were collected at two critical time points: at maximum crop height and at manual lodging assessment. Height percentiles were extracted from UAV-derived point clouds generated using photogrammetric algorithms. These data were used to develop parametric, non-parametric, and ensemble prediction models, which were evaluated using three statistical metrics. The ensemble model, averaging predictions from all models, achieved the highest accuracy, with Pearson correlations of r = 0.80-0.84 and the lowest root mean square error (RMSE = 16-18), explaining 64-70% of variation in manual lodging counts. Model diagnostics and iterative refinement, including inspection of UAV imagery and dataset curation, had minimal impact on model performance, demonstrating the robustness of the approach. Model performance was consistent across sites, with minimal effects of stratified sampling on accuracy, confirming the ensemble approach as optimal for plot-level lodging assessment. This study demonstrates that integrated multi-temporal UAV imagery offers a practical alternative to labor-intensive manual evaluation methods by enabling high-throughput lodging assessment suitable for implementation in sorghum breeding programs.
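The abstract describes an ensemble that averages the predictions of all models and is scored by Pearson correlation and RMSE. The following is a minimal sketch of those three calculations in plain Python, not the authors' code. The function names are hypothetical, and the abstract does not state whether the average is weighted, so an unweighted mean is assumed here.

```python
import math


def ensemble_mean(predictions_by_model):
    """Average per-plot predictions across models (unweighted mean, an assumption)."""
    return [sum(column) / len(column) for column in zip(*predictions_by_model)]


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))
```

If the reported share of variation explained is the squared correlation (an assumption), then r = 0.80 corresponds to 0.64 and r = 0.84 to about 0.71, which is in line with the 64-70% range quoted above.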

2

Rapid, Non-Destructive Visualization of α-Zein Expression and Grain Protein Concentration in Maize Using the Floury2-RFP Reporter Transgene

Li, C.; Heller, N. J.; Tiskevich, C. J.; Moose, S. P.

2026-05-07 plant biology 10.64898/2026.05.05.723001 medRxiv

Top 0.1%

14.4%

Kernel composition traits in maize, including protein accumulation, are of broad interest. The amount of the most abundant proteins in maize endosperm, the α-zeins, can vary dramatically among genotypes and in response to soil nitrogen supply. Targeted reductions in α-zein accumulation can improve nitrogen utilization and the nutritional quality of maize grain but have traditionally required expensive and destructive phenotyping methods. The Floury2-RFP (Fl2-RFP) reporter gene enables rapid, non-destructive visualization of α-zein accumulation in individual maize kernels under white light. This feature is due to the high expression level programmed by the Fl2 promoter, the stability of zein proteins, and the use of monomeric RFP, which emits fluorescence without the need for multimerization. This study aimed to develop a method to quickly document and quantify Fl2-RFP accumulation using camera or smartphone images of either ears or shelled kernels. Results show images of shelled kernels processed with FIJI software capture the Fl2-RFP reporter phenotype better than images of ears. Fl2-RFP confirms the strong maternal control of α-zein accumulation and, like grain protein concentration, responds to soil nitrogen supply. The Fl2-RFP phenotyping pipeline effectively quantified Fl2-RFP accumulation by color features from both camera and smartphone images. Smartphone imaging of Fl2-RFP in a diverse population of inbreds followed by elastic net regression of extracted image features predicted kernel protein concentration, as measured by near-infrared spectroscopy, with moderate accuracy (R² = 0.68, MAE = 0.76, RMSE = 0.93). The spectral features that were most predictive of kernel protein concentration varied depending on whether the background endosperm color was white or yellow. The integrated analysis of Fl2-RFP intensity and grain protein concentration indicates genetic variation for kernel protein accumulation and N-responsiveness that is distinct from the well-studied α-zeins. Our findings highlight the Fl2-RFP reporter gene as a valuable tool for investigating the genetic complexity of grain protein concentration and associated traits in maize.

3

Prediction of late blight severity in a large panel of potato genotypes using low-altitude aerial images and machine learning methods

Loayza, H.; Ninanya, J.; Palacios, S.; Silva, L.; Pujaico Rivera, F.; Rinza, J.; Gastelo, M.; Aponte, M.; Kreuze, J. F.; Lindqvist-Kreuze, H.; Heider, B.; Kante, M.; Ramirez, D. A.

2026-04-09 plant biology 10.64898/2026.04.06.716456 medRxiv

Top 0.1%

12.5%

Potato (Solanum tuberosum L.) is a staple crop crucial to global food security, yet its production is severely threatened by late blight (LB), caused by Phytophthora infestans, one of the most destructive plant diseases worldwide. Breeding programs for LB resistance have traditionally relied on labor-intensive and subjective visual assessments, which limit scalability and consistency, particularly in early-generation trials. Unmanned aerial vehicle (UAV)-based remote sensing combined with machine learning (ML) offers a promising alternative for objective, high-throughput disease phenotyping. This study evaluated the potential of UAV-derived multispectral imagery and ML techniques to estimate LB severity across large and genetically diverse potato breeding populations, comprising 2,745 clones in one trial and 492 accessions in another, conducted in Oxapampa, Pasco, Peru. We compared vegetation index-based approaches with a machine learning framework that integrates K-means clustering and Kernel Ridge Regression (KRR) and assessed their ability to capture genotypic variation and support selection decisions. NDVI consistently showed a strong correlation with visually assessed LB severity, particularly at advanced stages of disease development, enabling objective discrimination between healthy and diseased canopy tissues. However, the KRR-based approach outperformed linear NDVI-based models by capturing nonlinear relationships between spectral responses and disease progression. Estimates of LB severity derived from NDVI and KRR models, expressed as best linear unbiased estimates (BLUEs), showed strong and biologically consistent relationships with the area under the disease progress curve (AUDPC), particularly during later UAV acquisitions. Selection coincidence between UAV-derived estimates and AUDPC-based rankings was substantially higher at intermediate to advanced stages of disease progression, suggesting that UAV assessments at these stages may capture sufficient phenotypic variation to distinguish genotypes. These findings indicate that UAV-based multispectral phenotyping, especially when integrated with ML, provides a practical and scalable approach for assessing LB severity in potato breeding programs while reducing the need for time-consuming field evaluations.
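The abstract relies on NDVI without defining it. As a reminder, NDVI is conventionally computed per pixel as (NIR - Red) / (NIR + Red). The sketch below shows that standard formula and a per-plot mean. The function names and the choice to return NaN for a zero denominator are illustrative assumptions, not details from the preprint.

```python
import math


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel's reflectances."""
    denominator = nir + red
    if denominator == 0:
        # No signal in either band; NaN is an assumed convention here.
        return float("nan")
    return (nir - red) / denominator


def plot_mean_ndvi(nir_values, red_values):
    """Mean NDVI over a plot's pixels, skipping pixels with no signal."""
    values = [ndvi(n, r) for n, r in zip(nir_values, red_values)]
    valid = [v for v in values if not math.isnan(v)]
    if not valid:
        raise ValueError("no valid pixels in plot")
    return sum(valid) / len(valid)
```

Healthy canopy reflects strongly in the near infrared and absorbs red light, so NDVI tends to fall as foliage is lost to disease, which is the behavior a severity model built on it would exploit.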

4

Spectral Phenotyping Reveals Time-Specific QTLs in Field-Grown Lettuce

Mehrem, S. L.; Zijl, A.; de Haan, M.; Van den Ackerveken, G.; Snoek, B. L.

2026-03-18 plant biology 10.64898/2026.03.16.711173 medRxiv

Top 0.1%

9.9%

Lettuce (Lactuca sativa) is an important field crop, but our understanding of its phenotypic variation and underlying genetics under natural field conditions remains limited, posing challenges for identifying effective crop breeding targets. Longitudinal hyperspectral phenotyping allows for non-invasive monitoring of crop performance under diverse agricultural conditions. In this study, we used hyperspectral imaging to assess the phenotypic variation of almost 200 different field-grown lettuce varieties, following the same plants from just after seedling- to flowering-stage. With automated image processing, we extracted a wide range of spectral phenotypes related to metabolite content, growth efficiency, and environmental stress responses, creating a multi-dimensional time-resolved data set. Principal component analysis (PCA) revealed the major axes of spectral variation over time, and highlighted differences in spectral patterns among lettuce genotypes. Integrating on-site weather data, we modelled GxE interactions of reflectance, revealing regions of the lettuce vegetation spectrum that are primarily shaped by genotype and/or environment. We estimated phenotypic plasticity in response to time, temperature and rainfall using best linear unbiased predictions (BLUPs), capturing genotype-specific developmental trajectories and responses to the environment. We used genome-wide association studies (GWAS) to identify quantitative trait loci (QTLs) of PC-based, single and BLUP-based phenotypes, disentangling the genetic architecture of spectral lettuce phenotypes from major axes of variation down to single wavelength spectral plasticity. These findings provide new insights into the genome-wide genetic regulation and dynamics of spectral phenotypes in field grown lettuce.

5

Leaf and cluster spectral signatures reveal trait-dependent prediction performance for grapevine cluster architecture and juice quality

Robles-Zazueta, C. A.; Strack, T.; Schmidt, M.; Callipo, P.; Robinson, H.; Vasudevan, A.; Voss-Fels, K.

2026-03-31 plant biology 10.64898/2026.03.27.714894 medRxiv

Top 0.1%

9.0%

Grapevine cluster architecture is a key selection target in breeding programs because it influences disease susceptibility, yield stability and juice quality. High-throughput phenotyping offers a rapid and non-destructive approach to capture biochemical and structural variation in these traits, yet the influence of plant organ reflectance and data partitioning strategies on trait prediction remains poorly understood. In this study, we evaluated how hyperspectral reflectance from different grapevine organs contributes to the prediction of cluster architecture and juice quality traits in two clonal populations of Riesling and Pinot. Using partial least squares regression (PLSR), we assessed the prediction accuracy of eight cluster architecture and six juice quality traits under two data partitioning strategies. Models based on cluster reflectance outperformed those using dry leaf reflectance for most traits, except for pH. Partitioning the dataset by cluster type increased trait variance and improved predictions for number of berries (R² = 0.53), berry diameter (R² = 0.79), and total acidity (R² = 0.48). Visible, red-edge and NIR spectra were the most informative regions for predicting the traits studied. Together, our results highlight the importance of organ-specific data and appropriate calibration strategies to improve phenomic models for the development of scalable proxies for grapevine improvement. Highlight: Spectral phenomics reveals that prediction accuracy in grapevine depends on organ spectral signatures and traits, with cluster reflectance outperforming leaves, informing new phenotyping strategies for breeding improvement.

6

Presymptomatic plant disease detection with PSNet: A low-cost hyperspectral imaging and RGB fusion framework.

Crabb, G. U.; Cevik, V.; Chen, X.; Priest, N. K.; Zhao, Y.

2026-03-04 plant biology 10.64898/2026.03.02.709086 medRxiv

Top 0.1%

8.4%

Plant pathogens cause major yield losses worldwide, threatening food security and livelihoods. Because early infection is difficult to diagnose, management often relies on prophylactic pesticide use, increasing costs and environmental impact. Here we present PSNet, a multimodal framework that fuses hyperspectral imaging with RGB information for presymptomatic plant disease detection, together with a low-cost, portable hyperspectral camera incorporating a 3D-printed housing and optical mounts, costing under £500. We validate the approach using Arabidopsis thaliana infected with the oomycete Albugo candida. Imaging at 2 and 4 days post inoculation, prior to visible symptoms, revealed consistent spectral signatures that distinguished infected from healthy plants, while imaging at 6 days post inoculation captured the transition toward early symptom emergence. The most discriminative spectral regions overlapped wavelengths previously associated with plant responses to biotic stress, supporting the biological plausibility of these signatures. On a four-class task (healthy, 2 dpi, 4 dpi, 6 dpi), PSNet achieved 92.7% overall accuracy and 97.1% accuracy for binary healthy versus infected classification. Together, these results demonstrate that presymptomatic detection is feasible under controlled conditions using low-cost hardware and multimodal learning, underscoring the potential of scalable, multimodal systems for early disease monitoring.

7

Multi-Scale Contextual Attention for Robust Crop and Pest Image Classification

Majid, M.; Tariq, H.; Mumtaz, I.; Kashif, M.

2026-04-28 plant biology 10.64898/2026.04.24.720764 medRxiv

Top 0.1%

6.8%

Image-based crop and pest recognition can reduce the delay and cost of manual field scouting and thereby support timely intervention in precision-agriculture workflows. However, real field imagery remains challenging because of cluttered backgrounds, occlusions, illumination changes, and strong scale variation across crops. Symptoms are often small or low-contrast, and pests may be partially hidden, which reduces reliability outside controlled environments. A unified multi-class crop-pest/condition recognition framework is presented, in which a ResNet-50 backbone is enhanced with a Multi-Scale Contextual Attention (MSCA) module. The main novelty is the integration of explicit multi-scale contextual aggregation with lightweight joint channel and spatial attention through residual fusion, evaluated under a fixed and reproducible protocol. A curated dataset of 21,404 field-style images covering 15 crop and pest/condition classes was compiled, and a leakage-aware fixed split with a held-out test set was adopted to support reproducibility. Augmentation was applied only to the training subset to improve robustness; the validation data were not augmented in the same manner. On the held-out test set, the proposed approach achieved balanced performance, with an accuracy of about 0.93 and a macro-F1 score close to 0.94, and outperformed established baselines such as EfficientNet, Vision Transformer, and attention-based CNN models under identical evaluation settings. Controlled ablations were used to isolate the contribution of MSCA and augmentation under the same training configuration. These results indicate that lightweight multi-scale contextual attention is effective for crop and pest recognition under realistic field conditions, although some visually similar classes remained difficult.

8

Genotypic and environmental effects on seed coat patterning and nutritional composition in common bean (Phaseolus vulgaris L.)

Bolt, T. M.; Cole, A.; Bains, R.; Tian, L.; Parker, T. A.; Gepts, P.; Palkovic, A.; Bornhorst, G.; Diepenbrock, C. H.

2026-04-16 plant biology 10.64898/2026.04.13.718301 medRxiv

Top 0.1%

6.6%

Common bean (Phaseolus vulgaris L.) is the leading grain legume consumed directly by humans and a primary source of nutrients in many communities. This study utilized common bean genotypes with diverse seed coat phenotypes to investigate genotypic and environmental effects on pigmented seed coat area and seed macronutrient (protein, starch, fat, ash, moisture), anti-nutrient (phytate), and mineral nutrient (iron, zinc, calcium, phosphorus, magnesium, potassium, sodium) profiles. Recombinant inbred lines (RILs) that comprise six phenotypic classes for seed coat patterning and nine commercial cultivars were field-evaluated for multiple years across inland, coastal, and intermountain environments in California. A custom near-infrared spectroscopy calibration improved macronutrient prediction accuracy relative to a pre-existing calibration. Environmental effects on macronutrients were pronounced; the 2022 coastal growing environment was the most distinct, characterized by significantly higher starch and moisture content and significantly lower protein content in the RILs relative to any other environments. Across growing years in the RILs, greater consistency was observed at the inland site, where only protein was significantly different; all macronutrient traits significantly differed within the intermountain site. Certain commercial cultivars largely maintained their relative rank for protein content across environments, indicating consistency of genotypic performance, and Black Nightfall ranked among the highest for iron, zinc, phosphorus, and magnesium. Percent pigmented seed coat area was significantly negatively correlated with both calcium and magnesium concentrations. These results underscore the importance of genotype-by-environment field trials for seed coat patterning, seed nutritional composition, and their interplay, to support breeding of common bean among other grain legumes. 
Highlights:
- Custom near-infrared spectroscopy (NIRS) calibration improved prediction accuracies
- Environmental effects significantly influenced common bean macronutrient composition
- Certain cultivars ranked consistently for macronutrient traits across environments
- Seed coat pattern was significantly associated with mineral nutrient concentrations

9

Progeny differentiation in faba bean using hyperspectral images and machine learning

Schlichtermann, R.-H.; Warnemuende, S.; Tietgen, H.; Welna, G.; Stahl, A.; Wittkop, B.; Snowdon, R.

2026-05-21 genetics 10.64898/2026.05.19.725957 medRxiv

Top 0.1%

6.3%

Though currently a minor crop, faba bean is a promising source of plant-based protein as global diets shift towards more plant-based nutrition. To realise this potential, advances in breeding and cultivation are crucial. To exploit heterosis, faba bean breeding frequently utilises synthetic cultivars, which involves open pollination of inbred lines to produce a mixture of F1 hybrid seeds and self-pollinated offspring. Pure F1 hybrid cultivars are currently unavailable due to unstable cytoplasmic male sterility (CMS) systems. An ability to distinguish F1 seeds from their parental inbreds via characteristics associated with xenia effects could change this. The xenia effect refers to the influence of paternal pollen on seed traits, for example seed weight and cotyledon cells in faba bean. In this study, we exploited the xenia effect captured in hyperspectral imaging data to develop machine learning scenarios for discriminating between parental and F1 seeds of open pollinated synthetic combinations (Syn-1). The hyperspectral data were pre-processed using Savitzky-Golay filtering to reduce noise and smooth the spectra. Various machine learning algorithms were applied, incorporating Bayesian hyperparameter optimisation. The scenarios achieved up to 98.9% accuracy in separating parental components of Syn-1. When including all seeds, the model achieved 40.7%, indicating moderate detection and classification performance. As the harmonic mean of precision and recall, the F1 score accounts for both the correctness of F1 seed detections and the completeness with which F1 seeds were detected. While this approach does not yet enable the development of full hybrid cultivars, it paves the way for hybrid-enriched cultivars. These could help to streamline breeding for synthetic cultivars and potentially increase yields, for example by increasing the proportion of F1 hybrid seeds in synthetic cultivars. This study extends knowledge of the xenia effect in faba bean and provides a basis for further research aimed at enhancing breeding methods and productivity.
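Note that the abstract uses "F1" in two senses: the F1 hybrid generation of seeds and the F1 score as a classification metric. The sketch below illustrates only the metric, computed as the harmonic mean of precision and recall from confusion counts. The function name and the convention of returning 0.0 when the score is undefined are assumptions for illustration.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 score: harmonic mean of precision and recall.

    Returns 0.0 when precision and recall are both zero or undefined
    (an assumed convention).
    """
    predicted_positive = true_positives + false_positives
    actual_positive = true_positives + false_negatives
    precision = true_positives / predicted_positive if predicted_positive else 0.0
    recall = true_positives / actual_positive if actual_positive else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Here a true positive would be a hybrid seed correctly labelled as hybrid. Precision then measures how many detected hybrids are real, and recall measures how many real hybrids were found, matching the "correctness" and "completeness" wording in the abstract.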

10

Field and lab phenomics facilitate detection of genetic variation for iron deficiency chlorosis tolerance in sorghum

Cerimele, G.; Kent, M.; Miller, M.; Best, R.; Franks, C.; Kakar, N.; Felderhoff, T.; Sexton-Bowser, S.; Morris, G. P.

2026-04-05 genetics 10.64898/2026.04.01.715717 medRxiv

Top 0.1%

6.3%

Bioavailability of iron, an essential micronutrient to plants, is low in alkaline or calcareous soils, which are prevalent across semi-arid production regions. Breeding efforts to increase tolerance to iron deficiency chlorosis (IDC) in sorghum, a major crop of semi-arid regions, are confounded by spatial variation of stress severity in field trials. Here we developed and validated two high-throughput phenotyping approaches to address this challenge, with multi-spectral aerial imaging in the field and a controlled-environment assay to isolate the effects of iron bioavailability. In the field, severity and uniformity of stress are highly predictive of genetic signals for IDC tolerance (R² > 0.6 for soil pH metrics and H²). Plot-level data filtering for stress conditions based on control genotypes successfully addresses field spatial variation (unfiltered H² = 0.18 vs. filtered H² = 0.4). The controlled-environment assay proxies field stress using iron sources with differential bioavailability, evidenced by high heritability (H² = 0.98) and phenotypic differential for hybrid control genotypes that matches field performance. Finally, we show that assay phenotypes are suitable for genome-wide association studies in global germplasm. Together, these field and lab phenomic approaches can be deployed to understand genetics of IDC tolerance and develop crops resilient to alkaline soils. Highlight: Stress severity and uniformity greatly impact detection of genetic signals underlying iron deficiency chlorosis tolerance in sorghum. A controlled-environment assay reduces spatial heterogeneity and improves assessment of tolerance genetics.

11

LIME: a fully automated pipeline for high-throughput quantification of leaf lesions

Tan, D.

2026-05-10 plant biology 10.64898/2026.05.07.723432 medRxiv

Top 0.1%

6.2%

Accurate quantification of leaf lesion severity is essential for plant disease research and phenotyping but is often limited by subjective visual scoring and time-intensive manual image analysis. We present LIME, a fully automated, open-source image analysis pipeline for high-throughput quantification of leaf lesions from disease assay images. LIME integrates zero-shot leaf segmentation using the Segment Anything Model with a convolutional neural network for lesion area estimation. Applied to Arabidopsis thaliana leaves infected with Sclerotinia sclerotiorum, the proposed approach achieved a mean absolute percentage error of 12.9%, comparable to observed intrarater variability in manual scoring. Stratified evaluation across lesion-size groups demonstrated consistent prediction accuracy for small, intermediate, and large lesions, and comparative analysis showed that the deep learning-based model substantially outperformed color-based baseline methods. Under GPU-accelerated execution, LIME processed complete assays containing approximately 200 leaves in 15 minutes, representing an approximate 13-fold reduction in processing time relative to manual annotation. Together, these results indicate that LIME enables objective, reproducible, and scalable quantification of leaf lesion severity in standardized plant pathology assays. The pipeline is released as an open-source tool to support quantitative phenotyping studies.
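The headline metric in this abstract is mean absolute percentage error (MAPE). A minimal sketch of the usual definition follows. The function name is hypothetical, and skipping samples whose true lesion area is zero is an assumption made here to avoid division by zero; the preprint may handle that case differently.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Samples with a true value of zero are skipped (an assumed convention),
    since the relative error is undefined for them.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("MAPE is undefined when all true values are zero")
    return 100 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)
```

Because each error is scaled by the true value, the same absolute error counts for more on a small lesion than on a large one, which is one reason a stratified evaluation by lesion size, as reported above, is informative.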

12

LeafContourEFD: a reproducible workflow for elliptic Fourier analysis with orientation normalization and lateral asymmetry

Konrai, K.; Ito, R.; Sunayama, S.; Omura, K.; Isagi, Y.; Kitajima, K.; Onoda, Y.

2026-04-20 plant biology 10.64898/2026.04.16.718881 medRxiv

Top 0.1%

6.2%

Premise: Elliptic Fourier analysis is widely used to quantify leaf shape variation, but inconsistent normalization and orientation alignment can introduce biologically irrelevant variation. In addition, a reproducible workflow from raw images to normalized elliptic Fourier descriptors (EFDs) is still lacking. Methods and Results: We developed LeafContourEFD, a GUI application for reproducible leaf morphometrics. It integrates image segmentation, contour extraction, EFD calculation, and an extended normalization framework, termed oriented true EFD normalization, based on a user-defined biological reference axis. Analyses of Quercus serrata, Q. crispula, and Triadica sebifera showed that existing normalization methods can introduce orientation-related variance when the first-harmonic major axis does not match the leaf base-to-tip axis. In contrast, oriented true normalization removed these artifacts, yielding clearer shape transitions along principal components and allowing shape variation among leaves to be captured while preserving biologically meaningful lateral asymmetry. Conclusions: LeafContourEFD improves interpretability and reproducibility in outline-based morphometrics and provides transparent outputs and metadata for data sharing and cross-study comparisons.

13

Unlocking the potential of Capsicum Germplasm Collections for Climate Resilience and Fruit Quality

Halpin-McCormick, A.; Nalla, M. K.; Radlicz, Z.; Zhang, A.; Fumia, N.; Lin, T.-h.; Lin, S.-w.; Wang, Y.-w.; Zohoungbogbo, H. P. F.; Wang, D. R.; Runck, B.; Gore, M. A.; Kantar, M. B.; Barchenger, D. W.

2026-03-28 plant biology 10.64898/2026.03.25.714358 medRxiv

Top 0.1%

4.9%

Climate change increasingly threatens global Capsicum (pepper) production. Accelerating the deployment of climate-resilient cultivars requires effective use of genetic diversity conserved in genebanks. We implement a "turbocharging" strategy in Capsicum by integrating genome-wide association studies and genomic prediction in a core collection (n = 423), followed by genomic prediction across the global collection (n = 10,250) using the core as a training population. We generated genomic estimated breeding values (GEBVs) for 31 high-accuracy traits (r > 0.5) encompassing hyperspectral phenotypes (heat/control), agronomic performance (heat/control) and fruit quality. To enhance accessibility and decision-making, we developed a large language model (LLM) integrated application that enables flexible, preference-based selection of candidates. By narrowing the parental decision space, this framework streamlines screening of large germplasm collections while balancing climate resilience, quality attributes and market demands. Our approach provides a scalable decision-support system to accelerate climate-resilient Capsicum breeding and maximize global genetic resources.

14

Potato yield can be predicted by using drone-captured and environmental measurements early in the growing season

Vizintin, A.; Zagorscak, M.; Turk, E.; Kriznik, M.; Petek, M.; Stare, K.; Wurzinger, B.; Shaikh, M. A.; Heselmans, G.; Sollinger, J.; Lindenbergh, P.-J.; Graveland, R.; Oome, S.; Prat, S.; Bachem, C.; Teige, M.; Doevendans, B.; Ribarits, A.; Zrimec, J.; Gruden, K.

2026-03-11 plant biology 10.64898/2026.03.09.709817 medRxiv

Top 0.1%

4.8%

Accurate pre-harvest prediction of crop yield informs variety selection, optimizes management, and accelerates breeding. As potato is the world's leading non-grain staple, here we evaluate a diverse panel of varieties in a three-year field trial across five European locations. Canopy development and environmental parameters are monitored throughout the growing season using drone-based imaging, in-field sensors and gene expression measurements, while tuber yield and quality traits are quantified at harvest. We show that these data enable the identification of climate-resilient, high-yielding genotypes and support the development of machine learning models that explain over 80% of yield variance in independent test sets. Strikingly, measurements collected within the first two months after planting achieve predictive performance comparable to models trained on full-season data. Model interrogation further shows that simplified five-parameter linear equations capture over 70% of yield variability. Our framework thus demonstrates the potential of integrative field phenotyping and data-driven modeling to improve variety selection across heterogeneous environments. Significance statement: The ability to predict harvest crop yields from pre-harvest measurements can enable farmers and growers to make informed decisions on variety selection and management practices, while breeders can benefit from accelerated breeding cycles. We perform a panel of field trials with potato, the no. 1 global non-grain staple, across varying conditions and locations, recording various growth- and climate-related data, including gene expression, and post-harvest yield and quality of tubers. We demonstrate the potential of the field trial data to facilitate the analysis and selection of best-performing varieties across diverse conditions and locations, and to revolutionize farming by enabling early (already within 2 months) and straightforward (only a couple of key measured variables) yield predictions with high accuracy.

15

A standard area diagram for potato common scab: comparable performance of image- and object-based validation

Cazon, L. I.; Paredes, J. A.; Quiroga, M.; Guzman, F.

2026-03-20 plant biology 10.64898/2026.03.18.712681 medRxiv

Top 0.1%

4.8%

Potato common scab (Streptomyces sp.) is an economically important disease that reduces the quality and market value of tubers. A key aspect in developing management strategies involves accurately quantifying the disease. Due to the three-dimensional nature of the tuber and the heterogeneous distribution of lesions across its surface, visual estimates of severity can be challenging. Therefore, the objectives of this study were to develop and validate a standard area diagram (SAD) for estimating common scab severity on potato tubers and to compare validation outcomes obtained using real tubers and digital images. A SAD comprising six severity levels (from 1.3 to 66.8%) was developed based on image analysis of naturally infected tubers. Validation was conducted using two complementary approaches in which inexperienced raters evaluated either real potato tubers or digital images of the same tubers under unaided and aided conditions. Accuracy, bias components, and inter-rater reliability were quantified using absolute error metrics, Lin's concordance correlation coefficient, intraclass correlation coefficients, and overall concordance correlation coefficients. Use of the SAD significantly improved accuracy, reduced systematic bias, and increased inter-rater reliability across both validation approaches. No significant differences were detected between assessments conducted on real tubers and images, although image-based evaluations showed a slight, non-significant tendency toward reduced scale and location bias under aided conditions. These results demonstrate that a dimension-aware SAD integrating information across the full tuber surface enhances the reliability and reproducibility of visual severity assessments and supports the use of image-based evaluations for training, large-scale surveys, and remote or collaborative applications involving three-dimensional plant organs.
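Lin's concordance correlation coefficient, used above to score rater accuracy, differs from Pearson correlation in that it also penalizes systematic shifts between estimated and actual severity. A minimal sketch of the standard formula follows, using population (1/n) variances. The function name is hypothetical and this is not the authors' analysis code.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient between two sequences.

    ccc = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y)) ** 2),
    with variances and covariance computed using a 1/n denominator.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    denominator = var_x + var_y + (mean_x - mean_y) ** 2
    if denominator == 0:
        raise ValueError("CCC is undefined for two identical constant sequences")
    return 2 * cov_xy / denominator
```

For example, a rater who overestimates every tuber by exactly one percentage point would still have a Pearson correlation of 1, but a concordance below 1, because the squared difference in means enters the denominator. The difference in means and the ratio of variances are the quantities behind the location and scale bias components mentioned in the abstract.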

16

Identifying water stress response haplotypes in barley using latent environmental covariates

Aldiss, Z.; Brunner, S.; Heidariask, B.; Chenu, K.; Van Haeften, S.; Baraibar, S.; Ganesgalingam, D.; Moody, D.; Hickey, L.; Lam, Y.

2026-05-07 plant biology 10.64898/2026.05.04.722807 medRxiv

Top 0.1%

4.7%

Purpose: Genotype-by-environment (G x E) interactions represent a major obstacle to increasing genetic gain in crop breeding, with the underlying physiological drivers often remaining obscured within conventional statistical models. This case study presents a novel framework that transforms the latent factors from Factor Analytic (FA) multi-environment trial (MET) models into heritable quantitative traits, enabling the genetic dissection of adaptive response patterns. Methods: A Factor Analytical Linear Mixed Model (FA-LMM) was fit to plot-level yield data for 1,036 barley genotypes across eight Australian trials. Results: Correlation of the factor loadings with APSIM-simulated environmental covariates demonstrated that the second latent factor FA2 was strongly correlated with the Water Stress Index (r = -0.83) during the critical flowering period, establishing water availability as the main biological axis of crossover G x E. Genotypic scores for the derived traits, Overall Performance (OP) and Water Stress Response (WSR), were subjected to high-resolution haplotype-based mapping using local Genomic Estimated Breeding Values (GEBV). Conclusion: This analysis successfully identified major genomic regions that accounted for a substantial proportion of the additive genetic variance. Gene Ontology enrichment of candidate genes within the top haploblocks implicated fundamental pathways related to energy homeostasis, root development, and stress response, with notable candidates including FTsH11, BPS1, and TDP1. The distribution of favourable Haplotypes of Interest (HOI) in elite cultivars suggested a historical signature of inadvertent selection for these adaptive mechanisms. This framework provides an explicit bridge between statistical modelling and functional genomics, offering breeders actionable genetic targets for accelerated development of climate-resilient cereals.

17

Efficient genomic prediction at reduced training size and moderate marker density in an expanded aus-NAM population of rice

Kitony, J. K.; Reyes, V. P.; Sunohara, H.; Tasaki, M.; Yamasaki, M.; Mori, J.-i.; Shimazu, A.; Nishiuchi, S.; Michael, T. P.; Doi, K.

2026-05-01 plant biology 10.64898/2026.04.28.721500 medRxiv

Top 0.1%

4.4%

Genomic selection (GS) can accelerate genetic gain in crops, but its effectiveness depends on training population design and marker density. Nested association mapping (NAM) populations provide a structured framework that captures broad allelic diversity within a controlled genetic background. Here, we evaluated genomic prediction (GP) and genome-wide association study (GWAS) performance in an expanded aus-NAM population of rice comprising 1,818 recombinant inbred lines across 14 families and 11 agronomic traits, using genotyping-by-sequencing (GBS) markers and projected whole-genome sequence variants. Prediction accuracy plateaued at moderate marker densities (~20k SNPs) and with training populations of ~500 lines (~40-60% of the available pool), with trait heritability emerging as the strongest determinant of predictive performance rather than model choice or marker density. In contrast, GWAS resolution continued to improve with increasing marker density, enabling detection of additional loci, including a chromosome 12 locus associated with heading date, while consistently recovering well-characterized genes such as EARLY HEADING DATE 1 (Ehd1) and SEMIDWARF 1 (SD1). These contrasting patterns indicate that GP reaches near-optimal performance once genome-wide variation is adequately represented, whereas GWAS benefits from higher marker density through improved locus resolution. The present study establishes a benchmark for implementing breeding programs involving japonica/indica crosses using GP in a single environment.

18

Easy to use and low cost leaf disease quantification workflow using Ilastik

Prouvost, A.; Connesson, L.; Le Gourrierec, T.; Freville, H.; David, J.; Plessis, C.; Magnier, B.

2026-05-16 plant biology 10.64898/2026.05.14.719059 medRxiv

Top 0.1%

3.9%

Accurate and reproducible assessment of foliar disease severity is essential for evaluating the performance of heterogeneous plant communities and understanding host-pathogen interactions. However, traditional visual scoring methods remain subjective, offer limited precision, and are difficult to scale in large phenotyping experiments. Here, we present a semi-automated image analysis workflow designed to quantify multiple foliar disease symptoms simultaneously on wheat flag leaves sampled from varietal mixtures. The workflow combines three methodological components: (i) a standardized protocol for leaf sampling and imaging, (ii) supervised machine learning segmentation using Random Forest implemented in Ilastik to classify multiple symptoms (powdery mildew and yellow rust), and (iii) a graphical user interface facilitating pipeline deployment by non-specialist operators. To evaluate the influence of image representation on classification performance, four color spaces (RGB, HSV, HLS, LAB) were systematically compared. The approach was validated using images of durum wheat flag leaves collected from a field experiment assessing eight-way varietal mixtures under natural fungal pressure. Cross-validation against manually annotated images demonstrated high segmentation accuracy across all symptoms. Comparison among color spaces revealed only minor differences in performance. Overall, this workflow offers a cost-effective, annotation-efficient and reproducible alternative to deep learning approaches, leveraging open-source and actively maintained tools while requiring limited training data and enabling objective, reproducible and scalable disease phenotyping.
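To illustrate what comparing color spaces means at the pixel level, the sketch below re-expresses one RGB pixel in HSV and HLS using Python's standard `colorsys` module. It is an assumption-laden illustration rather than the authors' pipeline: the function name `pixel_representations` and the sample pixel are hypothetical, and LAB is omitted because the standard library has no conversion for it.

```python
import colorsys


def pixel_representations(red, green, blue):
    """Express one 8-bit RGB pixel in the RGB, HSV and HLS color spaces.

    All returned channels are scaled to the range 0.0-1.0.
    """
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError("channels must be 8-bit values (0-255)")

    # colorsys expects floats in 0.0-1.0 rather than 0-255 integers.
    r, g, b = red / 255, green / 255, blue / 255
    return {
        "rgb": (r, g, b),
        "hsv": colorsys.rgb_to_hsv(r, g, b),  # (hue, saturation, value)
        "hls": colorsys.rgb_to_hls(r, g, b),  # (hue, lightness, saturation)
    }


# Hypothetical yellowish pixel, loosely the color of a rust pustule.
features = pixel_representations(200, 170, 40)
```

Each representation carries the same information in a different arrangement, which is consistent with the abstract's finding that classifier performance differed only slightly among color spaces.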

19

Reaction Norm Modeling of High-Dimensional Genomic and Environmental Data Improves Prediction Accuracy in Winter Wheat

Acharya, S. R.; Garcia-Abadillo, J.; Lyerly, J.; Brown-Guedira, G.; Jarquin, D.; Bandillo, N.

2026-05-08 genetics 10.64898/2026.05.05.722758 medRxiv

Top 0.1%

3.9%

Genomic prediction models that account for genotype-by-environment (GxE) interactions have the potential to accelerate the rate of genetic gain for yield and agronomic performance, yet relatively few studies have applied GxE prediction in public soft red winter wheat (Triticum aestivum) breeding programs. In this study, we extended a reaction norm-based genomic prediction framework by integrating weather-based environmental covariates to more effectively capture genotype-environment interactions. Key agronomic traits, including seed yield, plant height, test weight, and heading date, were evaluated across 33 environments (location-year) using over 3,200 breeding lines from the North Carolina State University small grains breeding program. Multiple genomic prediction models were compared using several cross-validation (CV) schemes representing common breeding scenarios. Across traits, the reaction norm M5 model, which incorporates both GxE and genotype-by-environmental covariate interactions (GxO), achieved the highest prediction accuracy (PA) in CV2 (predicting incomplete field trials) and, for yield and test weight, in CV1 (predicting new lines). The highest PA was observed for test weight under CV2 (0.54) and for yield under CV1 (0.41). Under CV0 (predicting new environments), the M3 model incorporating GxE produced the highest PA across traits, with the greatest accuracy for plant height (0.45), although differences among M2, M3, and M4 were small. Prediction under CV00 (predicting new lines in new environments) remained more challenging, with PA values of 0.10-0.20 across traits. Overall, our results demonstrate that integrating environmental covariates into genomic prediction models can improve predictive performance across diverse wheat-growing environments in North Carolina, supporting their utility for applied breeding efforts. Core ideas: (1) Integrating genotype-by-environment (GxE) interactions with environmental covariates improves prediction accuracy across environments. (2) Model performance varies by prediction scenario, with different approaches performing best for new lines, incomplete trials, or new environments. (3) Prediction of new lines in new environments remains challenging. Plain language summary: This study explores how adding environmental information to genomic prediction models can improve prediction accuracy in a public winter wheat breeding program. Using data from multi-environment trials conducted across diverse conditions in North Carolina, we evaluated statistical models that capture how different wheat lines respond to changing environments. By incorporating weather data, we improved the ability to predict performance across locations and years. These findings provide practical insights for refining selection strategies and accelerating genetic gain in wheat breeding.

20

Reliable quantification of multiplexed genetically encoded biosensors responsiveness in plant tissues

Levak, V.; Zupanic, A.; Pogacar, K.; Marondini, N.; Stare, K.; Arnsek, T.; Fink, K.; Gruden, K.; Lukan, T.

2026-03-16 plant biology 10.64898/2026.03.13.711581 medRxiv

Top 0.1%

3.8%

Genetically encoded biosensors are essential tools in biological research. They enable visualization of molecules of interest in vivo, from the subcellular level to the whole-organism level, and can be used to monitor the presence of small molecules, gene expression, protein activity, and protein degradation. However, multiplexing fluorescent biosensors in plants is notoriously difficult due to signal bleed-through and strong autofluorescence from chlorophyll. In this study, we investigated the potential of multiplexing biosensors based on the selection of reporter fluorescent proteins. We characterized the emission spectra, fluorescence lifetimes, and relative brightness of diverse fluorescent proteins in plant leaves. We show that selected proteins exhibit comparable brightness, supporting their use in co-expression experiments and reliable quantification of individual signals. To separate overlapping signals, we applied two different linear unmixing approaches and compared them to results obtained without unmixing. We identified the channel separation unmixing approach as the most suitable for biosensors. Additionally, we show how unmixing with the selected approach can be applied to separate autofluorescence, and we validated this approach in virus-infected cells by following organelle dynamics in vivo. Overall, our work demonstrates that biosensors can be multiplexed, even when their emission spectra overlap. Significance statement: Multiplexing genetically encoded biosensors in plants has been limited by overlapping fluorescent signals and strong autofluorescence. This study presents an optimized framework for linear unmixing and provides a MATLAB-based organelle segmentation tool, allowing precise quantification of multiple fluorescent reporters in vivo and advancing real-time visualization of complex cellular processes in plants.
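The general idea behind linear unmixing can be sketched for the simplest case of two fluorophores imaged in two detection channels. This is a generic illustration, not either of the two approaches compared in the abstract and not the authors' MATLAB tool. The function name `unmix_two_channels` and all bleed-through fractions and intensities are hypothetical.

```python
def unmix_two_channels(observed, mixing):
    """Recover two fluorophore signals from two overlapping channels.

    observed: (channel_1, channel_2) intensities for one pixel.
    mixing:   2x2 matrix where mixing[i][j] is the fraction of fluorophore j's
              signal detected in channel i (measured from single-label samples).
    Assumes the linear model observed = mixing * signals and inverts it.
    """
    (a, b), (c, d) = mixing
    det = a * d - b * c
    if abs(det) < 1e-12:
        raise ValueError("mixing matrix is singular; spectra are indistinguishable")

    ch1, ch2 = observed
    # Closed-form inverse of a 2x2 matrix applied to the observed vector.
    signal_1 = (d * ch1 - b * ch2) / det
    signal_2 = (a * ch2 - c * ch1) / det
    return signal_1, signal_2


# Hypothetical bleed-through: 10% of fluorophore 1 leaks into channel 2 and
# 20% of fluorophore 2 leaks into channel 1.
mixing = [[0.9, 0.2],
          [0.1, 0.8]]
observed = (98.0, 42.0)
signals = unmix_two_channels(observed, mixing)
```

In practice the same inversion is applied to every pixel, and autofluorescence can be handled by treating it as an additional component with its own reference spectrum, which requires at least as many channels as components.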